ABSTRACT

Context. The identification of increasingly small signals from objects observed with an imperfect instrument in a noisy environment
poses a challenge for a statistically clean data analysis.

Aims. We want to compute the probability of frequencies determined in various data sets to be related or not, which cannot be answered 
with a simple comparison of amplitudes. Our method provides a statistical estimator for a given signal with different strengths in a set 
of observations to be of instrumental origin or to be intrinsic. 

Methods. Our approach is based on the spectral significance, an unbiased statistical quantity in frequency analysis. Discrete Fourier Transforms
(DFTs) of target and background light curves are comparatively examined. The individual False-Alarm Probabilities are used to 
deduce conditional probabilities for a peak in a target spectrum to be real in spite of a corresponding peak in the spectrum of a 
background or of comparison stars. Alternatively, we can compute joint probabilities of frequencies to occur in the DFT spectra of 
several data sets simultaneously but with different amplitude, which leads to composed spectral significances. These are useful to 
investigate a star observed in different filters or during several observing runs. The composed spectral significance is a measure for 
the probability that none of the coinciding peaks in the DFT spectra under consideration is due to noise.

Results. Cinderella is a mathematical approach to a general statistical problem. Its potential reaches beyond photometry from ground 
or space: to all cases where a quantitative statistical comparison of periodicities in different data sets is desired. Examples for the 
composed and the conditional Cinderella mode for different observation setups are presented. 

Key words. methods: data analysis - methods: statistical - space vehicles: instruments - techniques: photometric



1. Introduction 

The micromag precision achieved by the MOST¹
(Microvariability & Oscillations of STars) mission (Walker et
al. 2003; Matthews 2004) not only provides exciting new
results in asteroseismology, but also reveals instrumental problems
that challenge our data reduction techniques (see Sect. 1.1).
Cosmic ray impacts on the detector, stray light, positioning
errors of the satellite, and thermal stability problems introduce
periodic and, in the worst case, pseudo-periodic effects into
photometric measurements. All this calls for new techniques in
data reduction and analysis (see Sect. 1.2).

Space observations in general can provide an unprecedented 
amount of measurements, requiring an enhanced degree of auto- 
matic data analysis without sacrificing accuracy and reliability. 
In this context, SigSpec (Reegen 2007) was developed to com- 
bine the Discrete Fourier Transform (DFT) - a standard method 
to determine stellar pulsation frequencies - with a clean statisti- 
cal quantity: the spectral significance of a peak in an amplitude 
or power spectrum by comparison to white noise. 

The basic idea of Cinderella is to use target and comparison
data sets simultaneously for a cross-identification of artifacts
in the frequency domain. It is the first technique permitting
a statistically unbiased and quantitative comparison of different
(not necessarily photometric) time series in the frequency domain.
Being applicable to practically all measurements of physical
quantities over time, Cinderella has the potential to become
a valuable tool beyond the scope of micromag space photometry.

1 MOST is a Canadian Space Agency mission, jointly operated by
Dynacon Inc., the University of Toronto Institute of Aerospace Studies,
the University of British Columbia, and with the assistance of the
University of Vienna, Austria.



1.1. The MOST mission 

The first space telescope designed and built for photometric stel- 
lar seismology was EVRIS (Vuillemin et al. 1998), a 10-cm pho- 
toelectric telescope aboard the MARS-96 probe, but it unfortu- 
nately did not achieve the transfer orbit. An instrument providing 
photometric information on a large scale useful for asteroseis- 
mology was NASA's WIRE satellite, whose primary scientific 
goal of infrared mapping failed, but a 5-cm star tracker telescope 
with a CCD detector turned out to permit stellar photometry of 
remarkable quality (e. g., Buzasi et al. 2000). The MOST satel- 
lite launched in June, 2003, assumed the role as a precursor to 
the CNES-led mission COROT (Baglin et al. 2004), which was 
successfully launched on December 27, 2006, and which is pro- 
ducing extremely useful space photometric data of hitherto un- 
precedented accuracy and volume. 

MOST, WIRE and COROT are low-Earth-orbit (LEO) missions
with comparable environmental effects (e.g., cosmic radiation,
stray light scattered from the Earth's surface). A further
commonality of all three missions is the requirement to extract
asteroseismic information from a series of up to hundreds of
thousands of CCD frames (or sub-rasters, respectively), each of
which may consist of a few hundred to several million pixels.
Hence, the present work may apply to other LEO space photometry
missions and to ground-based multi-object photometry.

Fig. 1. The raw light curve (blue) of the MOST Fabry target
β CMi and after data reduction (red). Harmonics of the satellite's
orbital frequency (≈ 14.2 d⁻¹; dotted green) and the detected stellar
signal (3.257 d⁻¹ & 3.282 d⁻¹; dotted black) are indicated.

The MOST telescope is a 15-cm Maksutov optical tele- 
scope, supplied with a single broadband filter and initially with 
two identical CCD detectors: one used for science data acqui- 
sition, the other for the Attitude Control System (ACS). Thanks 
to the low mass of 54 kg and the ACS developed by Dynacon, 
Inc. (Groccott, Zee & Matthews 2003; Carroll, Rucinski & 
Zee 2004), a pointing stability to approximately ±1" rms is 
achieved. 

In Fabry Imaging mode the telescope entrance pupil is im- 
aged onto the CCD via a Fabry microlens as is shown by Figs. 7 
and 8 of Walker et al. (2003). Each Fabry Image is an annulus 
with an outer diameter of 44 pixels. The pixels in a square sub- 
raster outside the annulus are used to estimate the background. 
MOST also obtains Direct Imaging photometry of typically 1-6 
stars, based on defocussed images (FWHM ~ 2.2 pixels; Rowe 
et al. 2006; Huber & Reegen 2008), and Guide Star photometry 
of about 20 - 30 stars (Aerts et al. 2006; Saio et al. 2006). 

1.2. Data reduction 

The data reduction described by Reegen et al. (2006) applies 
linear correlations between pairs of target and background pixels 
for stray light correction. This so-called decorrelation technique 
is also applicable to simultaneous photometry of several stars, in 
this case correlating variable vs. constant stars. 

Fig. 1 illustrates the performance of the Fabry imaging photometry
with MOST data of β CMi (Saio et al. 2007). The blue
graph refers to the raw data and the red graph to the reduced light
curve. The overall noise level decreased by a factor of about 10, and
so did the harmonics of the orbital frequency of the spacecraft
(≈ 14.2 d⁻¹ for an orbital period of 101.4 min; Walker et al. 2003).
However, instrumental peaks (dotted green lines) persisted on a lower
level and their amplitudes still exceeded the stellar signal (main
frequencies: 3.257 d⁻¹ & 3.282 d⁻¹; dotted black line).



1.3. SigSpec 

SigSpec (Reegen 2007) is based on DFT amplitude spectra and
consecutive prewhitening of dominant peaks. But instead of
considering the peak with the highest amplitude to be significant and
estimating the reliability roughly in terms of signal-to-noise
ratio, the Probability Density Function (PDF) is employed. The
PDF depends on the frequency and phase of the examined peak
using white noise as a reference. The mean photometric magnitude
in a time series is usually reduced to zero before evaluating
the DFT. SigSpec takes the resulting statistical consequences into
account, and is furthermore not restricted to Gaussian-distributed
residuals.

The False-Alarm Probability is a frequently used statistical
quantity in time series analysis. It is the probability of a peak
at a given amplitude level to be generated by noise. Formally
it is obtained through integration of the PDF. To avoid problems
in computing extremely low numerical values, SigSpec returns
a quantity called spectral significance (hereafter abbreviated
by "sig"), which is the negative (decadic) logarithm of the
False-Alarm Probability. A sig of s thus corresponds to 10^s
uncorrelated data sets containing pure noise being needed, on
average, for a peak to appear in the Fourier domain which is
comparable in amplitude and phase to the peak under consideration
in the observed data.
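
The relation between sig and False-Alarm Probability can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of SigSpec; only the definition sig = -log10(FAP) is taken from the text.

```python
import math


def sig_from_fap(fap: float) -> float:
    """Spectral significance as the negative decadic log of the False-Alarm Probability."""
    if not 0.0 < fap <= 1.0:
        raise ValueError("False-Alarm Probability must lie in (0, 1]")
    return -math.log10(fap)


def fap_from_sig(sig: float) -> float:
    """Inverse relation: False-Alarm Probability for a given sig."""
    return 10.0 ** (-sig)


def noise_sets_per_false_alarm(sig: float) -> float:
    """Average number of pure-noise data sets needed to produce one comparable peak."""
    return 10.0 ** sig
```

For instance, a sig of 5 corresponds to a False-Alarm Probability of 10⁻⁵, i.e. one comparable peak in about 100000 pure-noise data sets.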

Although SigSpec prevailed as a powerful tool for analyzing 
MOST photometry, it occasionally suffered from the weakness 
of having to refer to uncorrelated (i.e. white) noise. 

1.4. The virtue of Cinderella 

Frequencies with individual amplitudes and phases ("peaks") in 
the DFT spectra of a target and comparison data sets are exam- 
ined by Cinderella for compatibility. In other words, Cinderella 
allows us to investigate whether these data sets are related by any 
physical (deterministic) process. The procedure is the same if the 
comparison data represent sky background or a star with a frequency
spectrum different from that of the target star, which - in the best
case - is a constant star. Subsequently, the terms "target star" and 
"comparison star" will be used, keeping in mind that everything 
discussed here readily applies to sky readings instead of com- 
parison stars as well. Obviously, all compared data sets have to 
be observed under similar circumstances. An extension of the 
method to handle more than one comparison data set is useful 
for multi-object environments, such as photometry in a field. 

In conditional mode, Cinderella establishes a quantitative 
comparison of significant frequencies occurring at the same time 
in at least two different data sets. It returns a statistically robust 
value, called conditional sig, for the probability that a peak in 
the spectrum of one data set is not (deterministically) related to a 
peak in the other data set(s) within a given frequency resolution. 

The alternative composed mode is dedicated to testing 
whether peaks in different DFT spectra with similar frequencies 
are "real", in the sense of not due to noise. The corresponding 
quantity, the composed sig, is a measure for the probability that
none of the examined peaks is due to noise. 

1.5. Frequency resolution 

The question of how to set the frequency difference within which
peaks are considered coincident is crucial to the examination
of corresponding peaks in different DFT spectra. In this
context, an alternative to the Rayleigh resolution,

$$\delta f_R := \frac{1}{\Delta T}, \tag{1}$$

with $\Delta T$ denoting the total time interval width of the time
series, is introduced by Kallinger, Reegen & Weiss (2007). They
suggest additionally employing the sig for a peak amplitude according to

$$\delta f_K := \frac{1}{\Delta T \sqrt{\mathrm{sig}(A)}} \tag{2}$$

for obtaining a more realistic criterion for matching peaks (frequency
resolution) than provided by Eq. (1). Their numerical
simulations show an excellent compatibility of this quantity,
subsequently termed Kallinger resolution, to the frequency error
derived by Montgomery & O'Donoghue (1999).

For practical applications, it is useful to enhance the flexibility
of Cinderella by introducing an exponent z and to re-define
the frequency resolution according to

$$\delta f := \frac{1}{\Delta T \left[\sqrt{\mathrm{sig}(A)}\right]^{z}}, \tag{3}$$

where z usually attains values in the range [0, 1]. The Rayleigh
resolution is obtained for z = 0, whereas z = 1 yields the
Kallinger resolution.
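
A short sketch of how such a resolution criterion might be evaluated is given below. It assumes the generalized resolution has the form 1/(ΔT sig^(z/2)), which reproduces the Rayleigh resolution for z = 0 and the Kallinger resolution for z = 1. The function names, and the choice of a simple absolute-difference test for coincidence, are illustrative assumptions rather than the Cinderella implementation.

```python
def frequency_resolution(delta_t: float, sig: float, z: float = 1.0) -> float:
    """Frequency resolution 1 / (delta_t * sig**(z/2)).

    delta_t : total time base of the time series (e.g. in days)
    sig     : spectral significance of the peak (must be positive)
    z       : exponent; 0 gives the Rayleigh, 1 the Kallinger resolution
    """
    if delta_t <= 0.0:
        raise ValueError("time base must be positive")
    if sig <= 0.0:
        raise ValueError("sig must be positive")
    return 1.0 / (delta_t * sig ** (z / 2.0))


def peaks_coincide(f_target: float, f_comparison: float, resolution: float) -> bool:
    """Assumed criterion: two peaks coincide if they differ by at most the resolution."""
    return abs(f_target - f_comparison) <= resolution
```

For a 10-day time base and a peak with sig = 16, the Rayleigh resolution is 0.1 d⁻¹, whereas the Kallinger resolution shrinks to 0.025 d⁻¹.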

2. Theory 

The theoretical framework of Cinderella presented here con- 
tains a conversion that makes amplitudes in the DFT spectra of 
different datasets comparable, introduces conditional and com- 
posed sig, discusses how to handle peaks in a target dataset with- 
out a corresponding counterpart in the comparison dataset, and 
generalizes the method to multiple comparison datasets. 

2.1. Amplitude transformation between different mean magnitudes

Assuming that stray light artifacts are additive in terms of
intensity, a signal amplitude detected in a comparison data set may
readily be inherited for a comparison with the target amplitude, if
intensities were employed for the frequency analysis. The
corresponding magnitude variations appear on a scaling that depends
on the average magnitude. This is reasonable for instrumental
effects as well. Let us further assume that mean intensities
$\langle I \rangle$ are converted into mean magnitudes
$\langle m \rangle$ according to

$$\langle m \rangle = -2.5 \log \langle I \rangle, \tag{4}$$

which holds to a sufficient approximation if the variations are
small compared to the mean intensity. In strict terms, a geometrical
mean intensity transforms into an arithmetical mean magnitude.

Given a mean magnitude $\langle m_C \rangle$ and a stray-light induced
sinusoidal variation with amplitude $A_C$ (in magnitudes), the
maximum intensity in the comparison light curve will be

$$\langle I_C \rangle + \Delta I = 10^{-0.4\left(\langle m_C \rangle - A_C\right)}, \tag{5}$$

where $\langle I_C \rangle$ denotes the mean intensity of the comparison
data and $\Delta I$ is the intensity amplitude corresponding to $A_C$.
Thus an estimate of the intensity amplitude is obtained by

$$\Delta I = 10^{-0.4\left(\langle m_C \rangle - A_C\right)} - 10^{-0.4 \langle m_C \rangle}. \tag{6}$$



This magnitude-intensity transformation of amplitudes uses
the maximum and mean intensities only. The reason is that
variations are distorted by the logarithmic scaling, and this
distortion is stronger towards low intensities. Hence the Gaussian
error propagation (producing symmetric errors only) is not
appropriate, nor is it advisable to use the minimum intensity as an
estimator. Both statements were confirmed by numerical simulations.

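Since the stray-light induced variation is assumed additive in
terms of intensity, the maximum target intensity will be

$$\langle I_T \rangle + \Delta I = 10^{-0.4 \langle m_T \rangle} + 10^{-0.4\left(\langle m_C \rangle - A_C\right)} - 10^{-0.4 \langle m_C \rangle}, \tag{7}$$

substituting for $\Delta I$ according to Eq. (6). The approximation

$$A_T \approx 2.5 \log\left(1 + \frac{\Delta I}{\langle I_T \rangle}\right) \tag{8}$$

for the target amplitude corresponding to a comparison amplitude
$A_C$ leads to

$$A_T \approx 2.5 \log\left[1 + \frac{10^{-0.4\left(\langle m_C \rangle - A_C\right)} - 10^{-0.4 \langle m_C \rangle}}{10^{-0.4 \langle m_T \rangle}}\right]. \tag{9}$$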



This is an estimator of the amplitude in a target star corre- 
sponding to artificial intensity variations of amplitude Ac in a 
comparison star. 
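
As a hedged illustration of this amplitude transformation, the sketch below evaluates ΔI from the comparison data and converts it into a target amplitude via A_T ≈ 2.5 log(1 + ΔI/⟨I_T⟩), with intensities defined through ⟨m⟩ = -2.5 log⟨I⟩. The function name and argument names are hypothetical.

```python
import math


def transform_amplitude(a_c: float, mean_mag_c: float, mean_mag_t: float) -> float:
    """Target amplitude (mag) corresponding to a comparison amplitude a_c (mag).

    a_c        : amplitude found in the comparison data, in magnitudes
    mean_mag_c : mean magnitude of the comparison data
    mean_mag_t : mean magnitude of the target data
    """
    # Intensity amplitude from maximum minus mean comparison intensity.
    delta_i = 10.0 ** (-0.4 * (mean_mag_c - a_c)) - 10.0 ** (-0.4 * mean_mag_c)
    # Mean target intensity from the mean target magnitude.
    mean_i_t = 10.0 ** (-0.4 * mean_mag_t)
    # Additive intensity variation expressed as a magnitude amplitude of the target.
    return 2.5 * math.log10(1.0 + delta_i / mean_i_t)
```

If target and comparison share the same mean magnitude, the amplitude is returned unchanged; a target two magnitudes brighter than the comparison inherits an amplitude that is smaller by a factor of roughly 10^0.8.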

At this point it has to be emphasized that this is a theoreti- 
cally consistent transformation that will yield a reasonable esti- 
mate in many practical applications. However, the detailed study 
of contaminated measurements may occasionally demand special
approaches to the calibration of magnitudes. Such an example
is presented in Sect. 3 and discussed in detail therein.

2.2. Frequency and phase differences 

If a peak in the DFT amplitude spectrum of a comparison dataset
is found within the Rayleigh or Kallinger frequency resolution,
respectively, about a target peak, the two considered frequencies
and phases generally do not match perfectly. We know that
DFT peak amplitudes show systematic deviations for different
frequencies and phases (e.g. Kovacs 1980), whence a transformed
amplitude $A_T$ at a frequency $\omega'$ and a phase angle $\theta'$
in Fourier space need not refer to the same amplitude at the
frequency $\omega$ and the phase angle $\theta$ of the corresponding
target peak. However, since all calculations were performed using
SigSpec (Reegen 2007) and since the amplitudes are optimized by
least-squares fits, they may be considered free of such effects to a
satisfactory extent.

At the present status of our investigations, we omit possible 
effects of frequency and phase lag. Under the condition of the 
same instrumental or environmental process to be responsible 
for both target and comparison signal, the frequencies are ex- 
pected to be equal. In addition, frequency deviations are already 
taken into account for candidate selection. This is why the fre- 
quencies in the target and comparison data are considered equal 
at this stage of calculation. On the other hand, it was pointed 
out by Reegen et al. (2006) that stray light moving over a de- 
tector produces phase differences in the stray light signal mea- 
sured at different positions on the CCD. These phase lags are 
the main constraint to the quality of the data reduction proce- 
dure described there. Hence it definitely makes sense to omit the 
phase information in the technique introduced here and consider 
all signal phases consistently aligned to the phase in the target 
dataset. 



2.3. Conditional spectral significance

The interesting question is now, "What is the probability that a
given target peak with an amplitude $A(\omega, \theta)$ is generated by
the same process as a transformed comparison peak with an amplitude
$A_T(\omega', \theta')$?" The answer may be given in terms of sig.

According to Sect. 2.1 we may use
$A_T(\omega, \theta) \approx A_T(\omega', \theta')$.
If a comparison of sigs is desired for constant time-domain
sampling, frequency and phase, then the calculations simplify to a
comparison of signal-to-noise ratios,

$$\frac{\mathrm{sig}(A, \omega, \theta)}{\mathrm{sig}(A_T, \omega', \theta')} = \left(\frac{A}{A_T}\right)^{2} \frac{\langle x_T^2 \rangle}{\langle x^2 \rangle}, \tag{10}$$

where $\langle x^2 \rangle$ denotes the variance of the target dataset
including the signal itself, and $\langle x_T^2 \rangle$ is the variance
the target dataset would have if the amplitude were $A_T$ instead of $A$.
Annotating the variance of the target light curve after prewhitening
$\langle x_P^2 \rangle$, the scaling from $A$ onto $A_T$ is obtained via
the difference of variances $\langle x^2 \rangle - \langle x_P^2 \rangle$,
which is a measure for the amount of signal prewhitened for an
amplitude $A$. If an amplitude $A_T$ is used instead, the corresponding
amount will transform into
$\left(A_T / A\right)^2 \left(\langle x^2 \rangle - \langle x_P^2 \rangle\right)$.
Then the variance $\langle x_T^2 \rangle$ immediately evaluates to

$$\langle x_T^2 \rangle = \langle x_P^2 \rangle + \left(\frac{A_T}{A}\right)^{2} \left(\langle x^2 \rangle - \langle x_P^2 \rangle\right). \tag{11}$$

This expression transforms Eq. (10) into

$$\frac{\mathrm{sig}(A, \omega, \theta)}{\mathrm{sig}(A_T, \omega', \theta')} = 1 + \left[\left(\frac{A}{A_T}\right)^{2} - 1\right] \frac{\langle x_P^2 \rangle}{\langle x^2 \rangle}. \tag{12}$$

The conditional False-Alarm Probability of producing at
least an amplitude $A$, if an amplitude $A_T$ is presumed, is a
fraction of the corresponding individual False-Alarm Probabilities,
if the corresponding processes are independent. The sig is
defined as the (negative) logarithm of False-Alarm Probability,
whence a ratio of False-Alarm Probabilities corresponds to a
difference of sigs, i. e., we obtain

$$\mathrm{sig}(A \,|\, A_T, \omega, \theta) = \mathrm{sig}(A, \omega, \theta) \left\{1 - \left[1 + \left(\left(\frac{A}{A_T}\right)^{2} - 1\right) \frac{\langle x_P^2 \rangle}{\langle x^2 \rangle}\right]^{-1}\right\}. \tag{13}$$

This is the conditional sig of a target peak with an amplitude
$A$ under consideration of a comparison peak with a transformed
amplitude $A_T$, where the transformation of the comparison
amplitude may be performed according to Eq. (9). E. g., a peak with
a conditional sig of 2 is true despite the given comparison peak
in 99 out of 100 cases.

The computation of conditional sigs for multiple comparison
datasets contains the Cinderella analysis of the target dataset
under consideration vs. each individual comparison dataset.
Then the individual conditional sigs may be averaged over all
comparison datasets. The resulting mean conditional sig and the
corresponding rms error are reasonable estimators for the overall
reliability of a target peak. In practical applications, one will
trust in a target peak if the mean conditional sig is high, both in
absolute numbers and in units of rms error.

2.4. Joint distributions

An alternative question, relevant in some cases, is, "Given two
independently measured datasets, what is the (joint) probability
of a coincident peak not to be due to noise in both datasets?"
The difference with respect to Sect. 2.3 is that here none of the
two time series is treated as a mere comparison dataset. This
question may, e. g., apply to differential photometry of the same
target with respect to different comparison stars, or to
measurements of the same target in different years. The considered
case refers to a logical 'and'.

Given two statistically independent time series with two
coincident peaks at sigs $\mathrm{sig}(A_1)$, $\mathrm{sig}(A_2)$, the
False-Alarm Probability,
$\Phi_{\mathrm{FA}\,1,2} = 10^{-\mathrm{sig}(A_{1,2})}$, of an individual
peak is the probability that it is generated by noise. The
complementary probability that the considered peak is true is
$1 - 10^{-\mathrm{sig}(A_{1,2})}$. If the individual components are
statistically independent, the (joint) probability of all components
to be real is the product of the individual probabilities,

$$1 - \Phi_{\mathrm{FA}} = \left(1 - \Phi_{\mathrm{FA}\,1}\right)\left(1 - \Phi_{\mathrm{FA}\,2}\right). \tag{14}$$

Consistently, a "joint sig" is introduced as the negative logarithm
of the total False-Alarm Probability, $\Phi_{\mathrm{FA}}$, and in terms
of individual sigs, one obtains

$$\mathrm{sig}(A_1 \wedge A_2) := -\log\left\{1 - \left[1 - 10^{-\mathrm{sig}(A_1)}\right]\left[1 - 10^{-\mathrm{sig}(A_2)}\right]\right\}. \tag{15}$$

In computational applications, numerical problems may
come along with a straight-forward implementation of this
relation, namely if $10^{-\mathrm{sig}(A_1)}$ produces an overflow. If
$\mathrm{sig}(A_2)$ is high and $\mathrm{sig}(A_1) > \mathrm{sig}(A_2)$,
then the resulting joint sig will be
$\mathrm{sig}(A_1 \wedge A_2) \approx \mathrm{sig}(A_2)$, and the amount
of change in $\mathrm{sig}(A_2)$ by the composition with
$\mathrm{sig}(A_1)$ may be calculated by a linear estimate according to

$$\mathrm{sig}(A_1 \wedge A_2) \approx \mathrm{sig}(A_2) + \frac{\mathrm{d}\,\mathrm{sig}(A_1 \wedge A_2)}{\mathrm{d}\Phi_{\mathrm{FA}\,1}}\, \Phi_{\mathrm{FA}\,1}, \tag{16}$$

which evaluates to

$$\mathrm{sig}(A_1 \wedge A_2) \approx \mathrm{sig}(A_2) - \left(\frac{1}{\Phi_{\mathrm{FA}\,2}} - 1\right) \Phi_{\mathrm{FA}\,1} \log e. \tag{17}$$

For $\Phi_{\mathrm{FA}\,2} \ll 1$, we may set
$\frac{1}{\Phi_{\mathrm{FA}\,2}} - 1 \approx \frac{1}{\Phi_{\mathrm{FA}\,2}}$,
which yields

$$\mathrm{sig}(A_1 \wedge A_2) \approx \mathrm{sig}(A_2) - 10^{\mathrm{sig}(A_2) - \mathrm{sig}(A_1)} \log e. \tag{18}$$

If $\mathrm{sig}(A_1)$, $\mathrm{sig}(A_2)$ differ by e. g. 5, the joint
sig will deviate from
$\min\left[\mathrm{sig}(A_1), \mathrm{sig}(A_2)\right]$ in the 5th digit.

If more than two, say N, time series are examined, Eq. (15)
may be generalized to

$$\mathrm{sig}\left(\bigwedge_{n=1}^{N} A_n\right) := -\log\left\{1 - \prod_{n=1}^{N}\left[1 - 10^{-\mathrm{sig}(A_n)}\right]\right\}. \tag{19}$$

In practical applications, the employment of the joint sig as
an estimator for the reliability of a peak in several different DFT
spectra simultaneously may lead to very low absolute sig values.
This becomes evident, if we consider N corresponding peaks at
the same sig level csig. Then the composed sig evaluates to

$$\mathrm{sig}\left(\bigwedge A_n\right) = -\log\left[1 - \left(1 - 10^{-\mathrm{csig}}\right)^{N}\right], \tag{20}$$

which consistently decreases with increasing number of datasets
N.

Setting $\mathrm{csig} =: \frac{\pi}{4} \log e$, which is the expected
sig for white noise (Reegen 2007), Eq. (19) evaluates to

$$\mathrm{sig}\left(\bigwedge A_n\right) = -\log\left\{1 - \left[1 - \exp\left(-\frac{\pi}{4}\right)\right]^{N}\right\}. \tag{21}$$

This makes clear that both the sigs of given peaks as well as the
"noise" in the significance spectrum will consistently decrease
with the number of employed time series.
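
The joint sig of N coincident peaks, −log{1 − ∏[1 − 10^(−sig(A_n))]}, can be evaluated without the loss of precision that a direct implementation suffers for high sigs. The following sketch is one possible approach, not the Cinderella implementation: it sums natural logarithms via `math.log1p` and recovers the joint False-Alarm Probability via `math.expm1`. The function name is hypothetical.

```python
import math


def joint_sig(sigs):
    """Joint sig of coincident peaks in statistically independent datasets.

    sigs : iterable of positive individual sigs
    Returns -log10(1 - prod(1 - 10**-s)), evaluated in a numerically stable way.
    """
    sigs = list(sigs)
    if not sigs:
        raise ValueError("at least one sig is required")
    log_p_real = 0.0  # natural log of the probability that all peaks are real
    for s in sigs:
        if s <= 0.0:
            raise ValueError("each sig must be positive")
        log_p_real += math.log1p(-(10.0 ** (-s)))
    joint_fap = -math.expm1(log_p_real)  # 1 - exp(log_p_real) without cancellation
    return math.inf if joint_fap == 0.0 else -math.log10(joint_fap)
```

For two peaks with sigs 10 and 5, the result agrees with the linear estimate 5 − 10⁻⁵ log e, and for peaks at a common sig level the joint sig drops as further datasets are added.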
2.5. Composed spectral significance

The dependence of the statistical properties of the joint sig on
the number of datasets is potentially confusing, since it does not
provide numerical values that can be interpreted at first glance.
Thus it is convenient to introduce a more intuitive scaling.
Eq. (20) may be re-written as

$$\mathrm{csig}(A_n) = -\log\left\{1 - \left[1 - 10^{-\mathrm{sig}\left(\bigwedge A_n\right)}\right]^{1/N}\right\}, \tag{22}$$

where csig - the composed sig - is now considered as a function
of $A_n$. The meaning becomes transparent substituting Eq. (19)
for $\mathrm{sig}\left(\bigwedge A_n\right)$, which yields

$$\mathrm{csig}(A_n) = -\log\left\{1 - \left[\prod_{n=1}^{N}\left(1 - 10^{-\mathrm{sig}(A_n)}\right)\right]^{1/N}\right\}. \tag{23}$$

The composed sig of a sample of corresponding peaks is the
unique sig level for the individual peaks that would reproduce
the given joint probability. The advantage of this quantity is that
it is essentially independent of the number of datasets under
consideration.
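
A minimal sketch of the composed sig follows, assuming it is obtained from the geometric mean of the individual probabilities 1 − 10^(−sig(A_n)) as described above. The function name is hypothetical, and the logarithmic evaluation is a choice made here for numerical stability.

```python
import math


def composed_sig(sigs):
    """Composed sig: the common sig level reproducing the joint probability.

    sigs : iterable of positive individual sigs of corresponding peaks
    """
    sigs = list(sigs)
    if not sigs:
        raise ValueError("at least one sig is required")
    if any(s <= 0.0 for s in sigs):
        raise ValueError("each sig must be positive")
    # Mean of natural logs = log of the geometric mean of (1 - 10**-sig).
    mean_log = sum(math.log1p(-(10.0 ** (-s))) for s in sigs) / len(sigs)
    fap = -math.expm1(mean_log)
    return math.inf if fap == 0.0 else -math.log10(fap)
```

Peaks at a common sig level return that level regardless of how many datasets are combined, which is the independence of N noted in the text; for unequal sigs the result lies between the smallest and the largest individual sig.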
Fig. 2. Relation between the trust coefficient and the composed
sig; 12 different constellations of the constant sig levels assigned
for acceptance (a) and rejection (r) are displayed. The orange
lines represent the solutions for a → ∞.



2.6. Trust coefficient 

A related question is, "Given N datasets and an associated
composed sig for a set of corresponding peaks therein, what is the
fraction of datasets in which the considered peak is significant?"
Since sig is a floating-point number rather than a binary output
in the sense of, "This peak is true/false", it does not provide a
unique basis for the decision whether to consider a given peak
due to noise. But if we assign two constant sig levels a, r to
acceptance and rejection of a peak, respectively, Eq. (19) may be
written as

$$\mathrm{sig}\left(\bigwedge A_n\right) = -\log\left[1 - \left(1 - 10^{-a}\right)^{M} \left(1 - 10^{-r}\right)^{N-M}\right] \tag{24}$$

if M out of the N peaks are accepted. Expressing this relation in
terms of $\tau := M/N$, we obtain

$$\tau = \frac{\frac{1}{N} \log\left[1 - 10^{-\mathrm{sig}\left(\bigwedge A_n\right)}\right] - \log\left(1 - 10^{-r}\right)}{\log\left(1 - 10^{-a}\right) - \log\left(1 - 10^{-r}\right)} \tag{25}$$

for the fraction of accepted peaks in the examined sample. The
function $\tau$ is called the trust coefficient. It is the fraction of
reliable peaks in a sample of N datasets, based on the assumed
sig levels a for an accepted peak (not due to noise) and r for a
rejected peak (due to noise).

Substituting $\mathrm{sig}\left(\bigwedge A_n\right)$ by the right-hand
expression in Eq. (19) transforms Eq. (25) into

$$\tau = \frac{\frac{1}{N} \sum_{n=1}^{N} \log\left[1 - 10^{-\mathrm{sig}(A_n)}\right] - \log\left(1 - 10^{-r}\right)}{\log\left(1 - 10^{-a}\right) - \log\left(1 - 10^{-r}\right)}. \tag{26}$$

On the other hand, the trust coefficient is related to the composed
sig via

$$\tau = \frac{\log\left[1 - 10^{-\mathrm{csig}(A_n)}\right] - \log\left(1 - 10^{-r}\right)}{\log\left(1 - 10^{-a}\right) - \log\left(1 - 10^{-r}\right)}, \tag{27}$$

which follows from Eqs. (23) and (26). Since the composed sig
is independent of the number of examined spectra, the trust
coefficient is independent as well.



Fig. 2 displays the relation between the trust coefficient and
the composed sig for altogether 12 parameter combinations
where $a \in \{1, 1.5, 2, 3\}$ and
$r \in \{0.1, \frac{\pi}{4} \log e, 0.5\}$. For
$\mathrm{csig}(A_n) < r$, the trust coefficient is 0, for
$\mathrm{csig}(A_n) > a$ it is 1. Furthermore, for $a \to \infty$,
Eq. (27) yields

$$\tau_\infty = 1 - \frac{\log\left[1 - 10^{-\mathrm{csig}(A_n)}\right]}{\log\left(1 - 10^{-r}\right)}, \tag{28}$$

which is indicated by the orange lines in the figure. For all three
values of r, the graphs for a = 3 and $\tau_\infty$ are practically
identical. Thus, $\tau_\infty$ will provide a reasonable estimator for
the percentage of significant peaks in a sample in practical
applications.
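
The trust coefficient as a function of the composed sig, and its limit for a → ∞, may be sketched as follows. The function names are hypothetical. Clipping the result to the interval [0, 1] is an assumption made here to match the statement that the coefficient is 0 below r and 1 above a; the ratio of logarithms does not depend on the base, so natural logarithms are used.

```python
import math


def _log_real(sig: float) -> float:
    """Natural log of the probability 1 - 10**(-sig) that a peak is not due to noise."""
    if sig <= 0.0:
        raise ValueError("sig levels must be positive")
    return math.log1p(-(10.0 ** (-sig)))


def trust_coefficient(csig: float, a: float, r: float) -> float:
    """Trust coefficient for acceptance level a and rejection level r (a > r > 0)."""
    if not a > r:
        raise ValueError("acceptance level a must exceed rejection level r")
    log_a, log_r = _log_real(a), _log_real(r)
    tau = (_log_real(csig) - log_r) / (log_a - log_r)
    return min(1.0, max(0.0, tau))  # clipping is an assumption, see lead-in


def trust_coefficient_limit(csig: float, r: float) -> float:
    """Limit of the trust coefficient for a -> infinity."""
    tau = 1.0 - _log_real(csig) / _log_real(r)
    return min(1.0, max(0.0, tau))
```

With r set to the white-noise expectation (π/4) log e and a = 3, the two functions differ by well under one percent at csig = 1, in line with the near-identity of the corresponding curves described above.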



2.7. Peaks without coincidences

The search and comparison of coincident peaks raises the question
how to treat signal components that have no counterpart in
the comparison spectra. According to our present practical
experience, it is in such cases reasonable to assign a constant sig
level of $\frac{\pi}{4} \log e \approx 0.341$ (the expected sig for
white noise) to the comparison data. Then a target peak, for which no
significant coincidence is detected, can be compared to the expected
value for pure noise by default.
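
One way to organise this default is sketched below. The peak list layout, the function name, and the rule of taking the nearest comparison peak within the frequency resolution are illustrative assumptions; only the default level (π/4) log e is taken from the text.

```python
import math

# Expected sig for white noise, (pi/4) * log10(e), approximately 0.341.
WHITE_NOISE_SIG = math.pi / 4.0 * math.log10(math.e)


def comparison_sig_for(target_freq, comparison_peaks, resolution):
    """Sig of the comparison peak matching a target frequency, or the noise default.

    target_freq      : frequency of the target peak
    comparison_peaks : list of (frequency, sig) tuples from the comparison spectrum
    resolution       : frequency resolution used for matching
    """
    candidates = [
        (abs(freq - target_freq), sig)
        for freq, sig in comparison_peaks
        if abs(freq - target_freq) <= resolution
    ]
    if not candidates:
        return WHITE_NOISE_SIG  # no counterpart: fall back to the white-noise level
    return min(candidates)[1]  # assumed rule: nearest peak in frequency wins
```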



3. Conditional spectral significance applied to MOST photometry



In Sect. 2.3 the conditional sig was introduced as a measure of
the probability that a specific peak in a DFT spectrum (characterized
by frequency, amplitude and phase) is deterministically
linked to a peak in another dataset within the frequency resolution
(Eq. (3)). Considering one of the two datasets to represent
the sky background or a constant comparison star, this concept
can be used to isolate intrinsic frequencies from instrumental or
environmental periodicities. If a peak in the target data has a
significant counterpart in the comparison data, it is not considered
intrinsic. If the frequency, phase and amplitude of the signal, the
time base of the observations, and the noise characteristics are
exactly the same in both datasets, the decision is obvious. But
how shall the general (and typical) case be handled where the
peaks and the noise are different in the two time series? What if
the readings are taken at different times, as in the case of
single-channel photometry? The answer is given by the conditional sig,
a novel approach to an old problem. Relying on SigSpec, it inherits
the substantial advantage of unbiased statistical methodology.

Fig. 3. Cinderella result for β CMi. The conditional sig was
estimated for each peak, referring to the background pixel with
the highest mean sig between 0 and 50 d⁻¹ as the comparison
dataset. Blue bars indicate frequencies with sig(A | A_T) > 5.
The red bars represent frequencies also found in the comparison
dataset with sig(A | A_T) < 5.

Fig. 4. Cinderella result for HD 114839. A mean conditional sig
was assessed for each peak by averaging over the conditional
sigs derived from each individual comparison time series. Blue
bars indicate frequencies with a mean conditional sig exceeding
the limit of 5 by more than 3σ. Frequencies not meeting this
condition are shown in red color.

An application of Cinderella comparing two datasets is pre- 
sented in Sect. 3.1 below.

Multi-object photometry monitoring three or more objects in 
one run builds up a scenario where more information is poten- 
tially available than can be handled by the procedure outlined 
above. If more than one constant star is in the observed sample, 
the comparison of target data with several other time series at 
once is desired. As mentioned in Sect. 2.3, this may be achieved
by a pairwise comparison of the target dataset vs. each compar- 
ison dataset. Then the arithmetic mean and rms error over all 
the results provide good estimators for the overall reliability of a 
peak in the target spectrum. 
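
The pairwise procedure can be sketched as follows, under the assumption that the conditional sig is the difference sig(A) − sig(A_T), with the ratio sig(A)/sig(A_T) = 1 + [(A/A_T)² − 1]⟨x_P²⟩/⟨x²⟩ as derived in Sect. 2.3. The function names are hypothetical, and the use of the sample standard deviation for the rms error is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, stdev


def conditional_sig(sig_target, a, a_t, var_total, var_prewhitened):
    """Conditional sig of a target peak given a transformed comparison amplitude.

    sig_target      : sig of the target peak
    a               : amplitude of the target peak
    a_t             : comparison amplitude transformed to the target's mean magnitude
    var_total       : variance of the target data including the signal
    var_prewhitened : variance of the target data after prewhitening the peak
    """
    # Ratio sig(A) / sig(A_T) for equal sampling, frequency and phase.
    ratio = 1.0 + ((a / a_t) ** 2 - 1.0) * var_prewhitened / var_total
    # Difference of sigs, sig(A) - sig(A_T), written in terms of the ratio.
    return sig_target * (1.0 - 1.0 / ratio)


def mean_conditional_sig(cond_sigs):
    """Mean and scatter of conditional sigs over several comparison datasets."""
    return mean(cond_sigs), stdev(cond_sigs)
```

A target amplitude equal to the transformed comparison amplitude yields a conditional sig of zero, and a target amplitude below it yields a negative value, as found for orbit-related frequencies that are present in both datasets.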

3.1. Single comparison dataset 

The first sample scenario concerns MOST measurements of the
target star β CMi and of the sky background. The target data
were reduced according to Reegen et al. (2006). To obtain a most
restrictive estimate, the "worst" background pixel was used for
comparison: a significance spectrum for the intensities of each
pixel over time was calculated, and the mean sig in the range
from 0 to 50 d⁻¹ was used to determine the appropriate pixel.
We picked the one with the highest mean sig. The frequency
resolution was applied according to Eq. (3) with z = 0.75.

After a comparison of significant signal components
(SigSpec output) in both reduced target and sky background data
using Cinderella (Fig. 3), the orbital frequency of the spacecraft
(14.2 d⁻¹), integer multiples and 1 d⁻¹ aliases are outstanding
with their negative conditional sigs, indicating that these
frequencies are present in both datasets and hence to be considered
instrumental. In the figure, all peaks with a conditional sig
sig(A | A_T) > 5 are displayed in blue color, the rest in red. The
limit of 5 corresponds to a probability of 10⁻⁵ for the target peak
to be generated by the same process as the corresponding
background peak: in one out of 100000 cases, the signal found
in the background data plus white noise would produce a DFT
amplitude in the target data at least as high as the given one.

Of course, a high conditional sig does not definitely rule 
out a peak to be instrumental. It only tells that no sufficient 
indication for a common origin of target and background sig- 
nal at the examined frequency is found. For example, signifi-
cant orbit-related frequencies may show up for the MOST data
also in the Cinderella output occasionally. This is likely due to
the fact that the target area is contaminated by stray light more 
severely than the sky background available. For a clear state- 
ment on the intrinsic (stellar) nature of suspicious peaks that 
survive the Cinderella procedure, follow-up measurements are 
indispensable. On the other hand, if there is a peak present in the
Cinderella output, that has to be ruled out for a good reason, the 
corresponding conditional sig may safely be used as a threshold 
and applied to the entire spectrum. 

Our technique was successfully applied to several MOST targets:
AQ Leo (Gruberbauer et al. 2007), γ Equ (Gruberbauer et
al. 2008), and HR 1217 (Cameron et al. 2008).

3.2. Multiple comparison datasets 

In some cases multiple comparison datasets are available. MOST 
guide star photometry is a good example. While sky measure- 
ments are not provided in this observation setup, several light 
curves of stars which likely suffer from the same contamination 
by stray light or instrumental trends, are present. However, not 
every single comparison data set is equally affected and we may 
not see each instrumental frequency in each DFT spectrum. If we 
do, the amplitudes (when transformed to some reference mean 
magnitude value) usually vary from object to object, depending 
on the position of the stars on the CCD. Still, if these effects are 
additive in intensities to a first approximation, Cinderella
provides the means to cope with such a situation due to the
statistical nature of the conditional sig.

The suspected Am star HD 114839, a γ Dor/δ Sct hybrid observed
by MOST using guide star photometry (King et al. 2006),
is a good example. It shows intrinsic variability in the low- and 
intermediate-frequency band, both of which are usually affected 
by stray light. Since four additional guide stars were observed at 
the same time, we are able to employ our technique. 

In this case, the target dataset was compared to each of the
comparison datasets according to the procedure discussed in
Sect. 3.1. The conditional sigs of the four Cinderella analyses
are averaged, and the standard deviation is computed. These two
quantities are used to form a two-fold criterion for the reliability
of a target peak. First, a threshold for the conditional sig is
defined. In the present example, it is 5. No peak with a mean
conditional sig below this limit is considered intrinsic. Moreover,
this threshold has to be exceeded by $k\sigma$, $\sigma$ denoting the
standard deviation and $k$ representing an arbitrarily chosen real
number. In this case, we use $k = 3$. Putting it all together, we only
rely on peaks whose mean conditional sigs exceed $5 + 3\sigma$.
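
The two-fold criterion can be written down in a few lines. The sketch below is an illustration only: the function name is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which estimator is meant.

```python
from statistics import mean, stdev

def is_intrinsic(conditional_sigs, threshold=5.0, k=3.0):
    """Apply the two-fold criterion to one target peak.

    conditional_sigs -- conditional sigs of the peak, one per comparison dataset
    threshold        -- limit for the mean conditional sig (5 in the example)
    k                -- multiple of the standard deviation (3 in the example)
    """
    avg = mean(conditional_sigs)
    # Assumption: sample standard deviation over the comparison datasets.
    scatter = stdev(conditional_sigs)
    return avg > threshold + k * scatter
```

A peak with conditional sigs of 20, 21, 19 and 20 passes, whereas a peak with values of 100, 2, 60 and 5 is rejected in spite of its high mean, because the scatter among the comparison datasets is too large.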

Fig. 4 shows the results, which are in very good agreement
with King et al. (2006). It has to be pointed out, however, that
in contrast to their method, no manipulation of data other than
removal of outliers using $3\sigma$ clipping was performed. The blue
peaks are considered intrinsic according to the criterion given
above. Among the red (rejected) peaks, there are some with
$\mathrm{sig}(A \mid A_t, \omega, \theta) < 5$ and even negative
conditional sigs, but also several peaks where the conditional sigs
range up to 100. In these cases the scatter of sigs in the comparison
spectra is very large. Most of the frequencies flagged as artifacts
are in the low-frequency region below 1 d$^{-1}$, where nothing
survives, and close to the MOST orbit frequency of 14.2 d$^{-1}$.
In addition, three peaks at 11.2, 13.2 and 15.2 d$^{-1}$ are rejected,
which correspond to 1 d$^{-1}$ aliases of the orbit frequency. This
aliasing is due to stray light undergoing periodic terrestrial albedo
variations as the spacecraft orbits the Earth above the terminator
(Reegen et al. 2006).



4. Composed spectral significances applied to MOST data



As described in Sect. 1.4, the composed sig is a measure of
the consistency of a signal detected in multiple data sets, al- 
lowing for some mismatch in frequency, amplitude and phase 
(see also Sect.0. This is, for instance, of good use for multi- 
site campaigns, where various instruments with different char- 
acteristics are employed. In the case of MOST data, the com- 
posed sig can be applied to multiple observing runs on the same 
star throughout the lifetime of the mission. Significant frequen- 
cies consistently detected in multiple data sets will also remain 
significant in terms of the composed sig. Peaks which are pro- 
duced by noise will most likely be unique to each observation 
run. Correspondingly, their composed sig will decrease with in- 
creasing number of time series involved. 
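
The behaviour described above can be illustrated with a small numerical sketch. It rests on two assumptions that are not taken from the formal definition: each sig is read as the negative decadic logarithm of a False-Alarm Probability, and the individual datasets are treated as statistically independent, so that the probability that none of the coinciding peaks is due to noise is a simple product. Mismatches in frequency, amplitude and phase are ignored, and the function name is hypothetical.

```python
import math

def composed_sig(sigs):
    """Illustrative composed sig for coinciding peaks with the given sigs.

    Assumption: datasets are independent, so the probability that no peak
    is due to noise is the product of (1 - 10**(-sig)) over all datasets.
    """
    log_p_none_noise = 0.0
    for s in sigs:
        if s <= 0.0:
            # A dataset without a significant peak drives the result to zero.
            return 0.0
        # log1p keeps precision when 10**(-s) is tiny.
        log_p_none_noise += math.log1p(-(10.0 ** (-s)))
    p_some_noise = -math.expm1(log_p_none_noise)
    if p_some_noise <= 0.0:
        return float("inf")
    return -math.log10(p_some_noise)
```

Under these assumptions, three datasets that each show a peak with a sig of 6 yield a composed sig of about 5.5, so a consistently detected signal stays significant. A peak that is strong in one dataset but weak or absent in another drops to the level of the weakest contribution, which reflects the decrease expected for noise peaks as more time series are included.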

Fig. 5. A comparison of the composed sig and the individual sigs
of five background light curves from the same observing run.
The gray bars represent an overplot of the significant peaks
detected in all five time series individually, as found by SigSpec.
The black bars correspond to the results of the composed sig
analysis described in Sect. 4.

In the case of conditional sigs, one of the involved datasets is
flagged as target, and we may search for coincidences using
the frequency resolution (Eq. (3)) about a target frequency. There
is no such reference for composed sig computation, because all
datasets are considered to be equivalent. Thus we split the
frequency range of interest into a sequence of frequency bins. In
our example, the grid of bins is ten times finer than the Rayleigh
frequency resolution, i.e. the bin width is one tenth of it, and
consecutive bins do not overlap. For each bin, the significance
spectra for all time series are searched for matching peaks, i.e.
peaks that either lie in the bin or deviate from it by not more than
their Kallinger resolution. If a time series contributes more than
one peak to a given bin, only the peak with the highest significance
is taken into account. Finally, the composed sig is computed for all
peaks associated with the bin.
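
The binning and matching steps can be sketched as follows. This is a minimal illustration, not the Cinderella implementation: the function names, the representation of a peak as a (frequency, sig, resolution) tuple and the use of the inverse time base as the Rayleigh resolution are assumptions made for the example.

```python
import math

def make_bins(f_start, f_stop, time_base, oversampling=10):
    """Non-overlapping bins, `oversampling` times finer than 1 / time_base."""
    width = 1.0 / (oversampling * time_base)
    n_bins = math.ceil((f_stop - f_start) / width)
    return [(f_start + i * width, f_start + (i + 1) * width)
            for i in range(n_bins)]

def strongest_matches(bin_lo, bin_hi, series_peaks):
    """Pick, for each time series, the matching peak with the highest sig.

    series_peaks -- one list per time series of (frequency, sig, resolution)
    Returns one entry per time series: the selected peak, or None.
    """
    selected = []
    for peaks in series_peaks:
        best = None
        for freq, sig, resolution in peaks:
            # Distance of the peak from the bin (zero if it lies inside).
            distance = max(bin_lo - freq, 0.0, freq - bin_hi)
            if distance <= resolution and (best is None or sig > best[1]):
                best = (freq, sig, resolution)
        selected.append(best)
    return selected
```

The sigs of the selected peaks are the input for the composed sig of the bin. A time series that returns `None` has no counterpart in that bin, which is the situation expected for peaks produced by noise.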

In Fig. 5 we present the SigSpec results of five individual
sky background time series from the observing run on the open
cluster NGC 752. We extracted the sky background signal of
five CCD subrasters by selecting pixels which are, to a first
approximation, not influenced by any stellar PSF. Each time series
was analyzed individually with SigSpec. We expected that in the
individual DFT spectra, the stray-light induced orbit peaks and
their 1 d$^{-1}$ aliases would be accompanied by spurious peaks at
lower sig, whereas the composed sigs would produce a spectrum
containing only features that refer to long-periodic trends and
stray light. The gray graph represents an overplot of all five
individual significance spectra. Between the orbit harmonics and
their aliases, many peaks are visible.

The black plot refers to the composition of all five light 
curves. Only long-term trends, common to all five datasets, as 
well as signal corresponding to the orbital frequency of the stray 
light are considered to be significant. Furthermore, 1 d$^{-1}$
sidelobes of the orbital harmonics are visible, referring to daily stray
light modulations probably induced by the dependence of the 
terrestrial albedo on the position over the Earth's surface. Other 
frequencies, clearly visible in the significance spectra of the indi- 
vidual time-series, are not consistently detected and are therefore 
regarded as noise. 



5. Conclusions and outreach 

This paper introduces a technique to interpret periodicities in 
an ensemble of data of common origin. Cinderella relies on 
SigSpec (Reegen 2007), thus benefitting from a correct employ- 
ment of the complex phase information in Fourier Space on the 
one hand and a clean statistical description of interrelation of 
datasets on the other. 

The conditional Cinderella mode is based on a quantitative
comparison between one target and one or more comparison 
datasets and returns a measure of the probability (conditional 
sig) for periodicities identified in the target data to be determin- 
istically related (to be 'unique') to the target. 

The composed Cinderella analysis returns a measure of the 
joint probability (composed sig) that a given periodicity ob- 
served in individual datasets - but with different signal strengths 
- is not due to noise. Such datasets could contain, e.g., mea- 
surements of the same target in different observing runs or with 
different instruments (e.g., different filters or simultaneous spec- 
troscopy and photometry). 

Our experience (as outlined in the examples above) confirms
that Cinderella reliably identifies residual instrumental
signal in the MOST data even after a fairly sophisticated data 
reduction in the time-domain and also provides quantitative ar- 
guments to distinguish intrinsic from instrumental signal. 

Cinderella is a statistically correct technique replacing what 
experienced observers achieve based on their "good feelings" 
when evaluating, for example, differential photometry, but, of 
course, the method is not limited to photometry. It quantitatively 
determines conditional and composed probabilities for matching 
peaks in DFT spectra of any kind of datasets containing period- 
icities. 